COMPUTER  SCIENCE 
TECHNICAL  REPORT  SERIES 

UNIVERSITY  OF  MARYLAND 

COLLEGE  PARK,  MARYLAND 
20742 




Technical  Report  TR-1115 


October 1981
F49620-80-C-0001


A  Heuristic  For 
Deriving  Loop  Functions* 


Douglas D. Dunlop and Victor R. Basili


Department  of  Computer  Science 
University  of  Maryland 
College Park, MD 20742


AIR FORCE OFFICE OF SCIENTIFIC RESEARCH (AFSC)

NOTICE OF TRANSMITTAL TO DTIC
This technical report has been reviewed and is
approved for public release IAW AFR 190-12.

Distribution is unlimited.

MATTHEW J. KERPER

Chief, Technical Information Division


*This work was supported in part by the Air Force Office of
Scientific Research Contract AFOSR-F49620-80-C-0001 to the
University of Maryland. The material contained in this paper
will become part of a dissertation to be submitted to the Graduate
School, University of Maryland, by Douglas D. Dunlop, in partial
fulfillment of the requirements for the Ph.D. degree in Computer
Science.

Copyright 1981 by D.D. Dunlop and V.R. Basili




ABSTRACT 


The  problem  of  analyzing  an  initialized  loop  and  verifying 
that  the  program  computes  some  particular  function  of  its  inputs 
is  addressed.  A  heuristic  technique  for  solving  these  problems 
is  proposed  which  appears  to  work  well  in  many  commonly  occurring 
cases.  The  use  of  the  technique  is  illustrated  with  a  number  of 
applications.  A  hierarchy  of  initialized  loops  is  suggested 
which  is  based  on  the  "effort"  required  to  apply  this  methodology 
in  a  deterministic  (i.e.  guaranteed  to  succeed)  manner.  It  is 
explained  that  in  any  case,  the  success  of  the  proposed  heuristic 
relies  on  the  loop  exhibiting  a  "reasonable"  form  of  behavior. 
An  informal  categorization  of  such  programs  is  made  which  is 
based on two opposing problem solving strategies. It is suggested
that our heuristic is naturally suited for use on programs
in one of these categories.


KEYWORDS and PHRASES: program verification,
grams,  loop  functions,  constraint  functions 
grams 

CR CATEGORIES: 5.24


1. Introduction


In this report, we will consider programs of the following
form:

<INITIALIZATION STATEMENTS>
while <LOOP PREDICATE> do
    <LOOP BODY STATEMENTS>
od.


These programs tend to occur frequently in programming in order
to accomplish some specific task, e.g. sort a table, traverse a
data structure, calculate some arithmetic function, etc. More
precisely, the intended purpose of such a program is often to
compute, in some particular output variable(s), a specific
function of the program inputs. In this paper, we address the
problem of analyzing a program of the above form in order to prove
its correctness relative to this intended function.

One common strategy taken to solve this problem is to
heuristically synthesize a sufficiently strong inductive assertion
(i.e. loop invariant [Hoare 69]) for proving the correctness of
the program. A large number of techniques to aid in the discovery
of these assertions have appeared in the literature (see, for
example, [Wegbreit 74, Katz & Manna 76]). It is our view,
however, that these techniques seem to be more "machine oriented"
than "people oriented." That is, they seem geared toward use in
an assertion generator for an automatic program verification
system. Furthermore, a sizable portion of the complexity of these
[...] nature [...] methodology proposed here is intended to be
used by programmers in the process of reading (i.e. understanding,
documenting, verifying, etc.) programs and is tailored to the
commonly occurring verification problem discussed above.

An alternative to the inductive assertion approach which is
taken in this paper is to invent an hypothesis concerning the
general input/output behavior of the WHILE loop. Once this has
been done, the loop can be proven/disproven correct with respect
to the hypothesis using standard techniques [Mills 72, Mills 75,
Basu & Misra 75, Morris & Wegbreit 77, Wegbreit 77, Misra 73].
If the hypothesis is shown to be valid, the
correctness/incorrectness of the program in question follows
immediately. It has been shown [Basu & Misra 75, Misra 73, Misra
79, Basu 80] that this loop hypothesis can be generated in a
deterministic manner (i.e. one that is guaranteed to succeed) for
two restricted classes of programs. The approach suggested here
is similar to this method in that the same type of loop behavior
seems to be exploited in order to obtain the hypothesis. Our
approach is not deterministic in general, but as a result, is
intended to be more widely applicable and easier to use than
those previously proposed in the literature.

One view of the problem of discovering the general
input/output behavior of the WHILE loop under consideration might
be to study it and make a guess about what it does. One might go
about doing this by "executing" the loop by hand on several
sample inputs and then guessing some general expression for the
input/output behavior of the loop based on these results.
Decisions that need to be made when using such a technique include
how many sample inputs to use, how these inputs should be
selected, and how the general expression should be inferred.
Another consideration is that hand execution can be a difficult
and error prone task. Indeed, it seems that the loops for
which hand execution can be carried out in a straightforward
manner are the ones that are least in need of verification or
some other type of formal analysis.

Our methodology is similar to this technique in that we
attempt to infer the general behavior of the loop from several
sample loop behaviors. In contrast to this technique, however,
the sample behaviors are not obtained from hand execution; rather
they are obtained from the specification for the initialized loop
program. In many of the cases we have studied, the general
behavior of the loop in question is quite easy to guess from
these samples. This is not to say that the loop computes a
"simple" function of its inputs or that the loop necessarily
operates in a "simple" manner. Much more accurately, the ease with
which the general behavior can be inferred from the samples is due
to a "simple" connection between a change in the input value of an
initialized variable and the corresponding change caused in the
result produced by the loop. We will expand on this idea in what
follows.


2. The Technique

In order to describe the proposed technique, we represent
the verification problem discussed above as follows:

{X ∈ D(f)}
X := K(X);
while B(X) do
    X := H(X)
od
{v=f(X0)}.

In this notation, X represents the data state of the program. The
symbols K and H are data state to data state functions
corresponding to the effects of the initialization and loop body
respectively. The function B is a predicate over the data state.
The program is specified to produce in the variable v a function
f of the input data state X0. The notation D(f) appearing in the
program precondition is the domain of the function f, i.e. the
set of states for which f is defined.


If D is the set of all possible program data states and T is
the set of values that the variable v may assume, the
specification function f has the functionality f : D -> T. In
order to verify a program of this form, we choose to find a
function g : D -> T which describes the input/output
characteristics of the loop over a suitably general input domain.
Specifically, this input domain must be large enough to contain
all the intermediate states generated as the loop iterates. If
this is [...]

We briefly consider two alternative approaches to synthesizing
this loop function g. The alternatives correspond to the
"top down" and "bottom up" approaches to creating inductive
assertions discussed in [Katz & Manna 76, Ellozy 81]. In the
"top down" alternative, the hypothesis g answers the question
"what would the general behavior of the loop have to be in order
for the program to be correct?" If such an hypothesis can be
found and verified, the correctness of the program is
established. If the program is incorrect, no such valid hypothesis
exists. In the "bottom up" alternative, the hypothesis g answers
the question "what is the general behavior of the loop?" In this
case, a valid hypothesis always exists. Once it has been found
and verified, the program is correct if and only if the
initialization followed by g is equivalent to the function f.

The advantage of a "top down" approach is that it is usually
easier to apply in practice because the verifier has more
information to work with when synthesizing the hypothesis. The
disadvantage of such an approach is that it may not be as
well-suited to disproving the correctness of programs. This is
because to disprove a program, the verifier must employ an
argument which shows that there does not exist a valid hypothesis.
The method described in this paper is based on the "top down"
approach. We will return to a discussion of this advantage and
disadvantage later.


We begin by assuming the program in question is correct with
respect to its specification. We then consider several properties
of the function g which result from this assumption. First, the
correctness of the program implies

(1)  X0 ∈ D(f) -> f(X0)=g(K(X0)).

That is, for inputs satisfying the program precondition, the
initialization followed by the loop yields the desired result.
Secondly, since the loop computes g,

B(X0) -> g(X0)=g(H(X0))

holds by the "iteration condition" [Misra 73] of the standard
technique for showing the loop computes g. This implies

B(K(X0)) -> g(K(X0))=g(H(K(X0))).

Combining with (1) yields

(2)  X0 ∈ D(f), B(K(X0)) -> f(X0)=g(H(K(X0))).

At this point we choose to introduce an additional universally
quantified state variable X into each of (1) and (2). The
results are the equivalent conditions

(1')  X0 ∈ D(f), X=K(X0) -> g(X)=f(X0)

and

(2')  X0 ∈ D(f), B(K(X0)), X=H(K(X0)) -> g(X)=f(X0).

We summarize by saying that if the program is correct with
respect to its specification, conditions (1') and (2') hold.


Suppose now that the specification (f), and the input/output
behavior of the initialization (K), loop predicate (B) and loop
body (H) are known. Given this, (1') and (2') can be used to
solve for the loop hypothesis g on [...]ing the correctness of
the program. Indeed, (1') and (2') can be thought of as defining
portions of the unknown loop function we are seeking.
Specifically, each of (1') and (2') can be viewed as defining a
function g with a restricted domain. In this light, for example,
(1') defines the function (i.e. set of ordered pairs)

g = {(X,Z) | there exists X0 ∈ D(f) such that (X=K(X0) & Z=f(X0))}.

We call (1') and (2') constraint functions since they are
functions and serve as constraints (i.e. requirements) on the
general loop function. More precisely put, the constraint
functions are subsets of the general loop function. The hope is
that if these subsets are representative of the whole, the general
loop function may be inferred through analysis of the constraint
functions.


In what follows we describe a 4 step process for constructing
a general loop function g from these constraint functions.
We suggest that the reader not be taken aback by what may appear
to be considerable complexity in the description of our
technique. We intentionally have attempted to describe the
procedure in a careful, precise manner. Furthermore, the technique
is based on a few simple ideas and, once those ideas have been
learned, we feel it can be applied with a considerable amount of
success.


Example 1 - As we describe these steps, we will illustrate
their application on the following trivial program to compute
multiplication:

{v>=0}
z := 0;
while v ≠ 0 do
    z := z + k;
    v := v - 1
od
{z=v0*k}.


We proceed with the example analysis as follows.


Step 1 : RECORD - The first step consists of recording the
constraint functions (copied from (1') and (2'))

C1:  X0 ∈ D(f), X=K(X0) -> g(X)=f(X0)

C2:  X0 ∈ D(f), B(K(X0)), X=H(K(X0)) -> g(X)=f(X0).

As a notational convenience, we dispense with the data state
notation and use program variables (possibly subscripted by 0 to
denote their initial values) in these function definitions. The
terms X0 ∈ D(f) and f(X0) come from the pre and post conditions
for the initialized loop respectively. The term X=K(X0) is based
on the input/output behavior of the initialization, and the terms
B(K(X0)) and X=H(K(X0)) together describe the input/output
behavior of the initialization followed by exactly 1 loop
iteration. We illustrate these ideas with the multiplication
program in Example 1. The constraint functions for this program
are as follows:

C1:  v0>=0, v=v0, z=0 -> g(z,v,k)=v0*k
C2:  v0>0, v=v0-1, z=k -> g(z,v,k)=v0*k.

We make the following comments concerning these function
definitions. First, in the [...] effect of the initialization or
loop [...] (i.e. [...] k0 [...] for the symbol k0).
Secondly, g is defined as a function of each program variable
which occurs in the loop predicate or loop body. That is, g is a
function of the variables on which the behavior of the loop
directly depends. Furthermore, note that in C2, the term v0>0
captures both X0 ∈ D(f) (i.e. v0>=0) and B(K(X0)) (i.e. v0≠0). As
a final remark, in a constraint function we will use the phrase
domain requirement to refer to the collection of terms to the
left of the "->" symbol and function expression to refer to the
expression which defines the value of g (e.g. v0*k in both C1 and
C2 above).
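
The RECORD step for Example 1 can be sketched in Python by tabulating
the two constraint functions over a finite sample of input states.
This is only an illustration under stated assumptions: the name
record_constraints, the tuple encoding (z, v, k) of the data state,
and the sample ranges are hypothetical and not part of the report.

```python
def record_constraints(inputs, f, K, B, H):
    """Tabulate C1 and C2 as dicts from loop-entry states to required results."""
    c1, c2 = {}, {}
    for x0 in inputs:              # x0 ranges over a finite sample of D(f)
        x = K(x0)                  # state after the initialization
        c1[x] = f(x0)              # C1: X = K(X0) -> g(X) = f(X0)
        if B(x):                   # the loop body executes at least once
            c2[H(x)] = f(x0)       # C2: X = H(K(X0)) -> g(X) = f(X0)
    return c1, c2


# Example 1 (multiplication); a data state is the tuple (z, v, k).
f = lambda s: s[1] * s[2]                      # specification: v0 * k
K = lambda s: (0, s[1], s[2])                  # z := 0
B = lambda s: s[1] != 0                        # v != 0
H = lambda s: (s[0] + s[2], s[1] - 1, s[2])    # z := z + k; v := v - 1

inputs = [(z, v, k) for z in (-1, 0, 5) for v in range(4) for k in range(-2, 3)]
c1, c2 = record_constraints(inputs, f, K, B, H)
```

Every key of c1 has z=0 and every key of c2 has z=k, which matches
the domain requirements of C1 and C2 above.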


Step 2 : SIMPLIFY - All variables which appear in the function
definition but not in the parameter list for g must eventually
be eliminated from the definition. On occasion, it is possible to
solve for the value of such a variable in the domain requirement
and substitute the equivalent expression for it throughout the
definition. To illustrate, in the definition C1 above, v0 is a
candidate for elimination. We know its value as a function of v
(i.e. v0=v), hence we can SIMPLIFY this definition to

C1:  v>=0, z=0 -> g(z,v,k)=v*k.

Note that the term v=v0 has disappeared since with the
substitution it is equivalent to TRUE. In a similar manner, the
second constraint function can be SIMPLIFIED to (using v0=v+1)

C2:  v>=0, z=k -> g(z,v,k)=(v+1)*k.

Although applying this simplifying heuristic is most often a
straightforward process, care must be taken to ensure that the
domain of the constraint function is not mistakenly extended.
For example, if d and d0 are integer variables, the definition

d0>0, d=d0*2 -> g(d)=d0*8

does not SIMPLIFY to

d>0 -> g(d)=d*4

since the first function defines a value of g only for positive,
even values of d while the second definition defines a value of g
for all positive d. The first function does SIMPLIFY to

d>0, EVEN(d) -> g(d)=d*4

where EVEN(d) is a predicate which is TRUE iff d is even.
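
The domain pitfall can be seen by writing each definition out as a
finite set of ordered pairs. The sketch below is illustrative only;
the bound of 49 on d0 and the dictionary names are arbitrary choices
made here.

```python
# d0>0, d=d0*2 -> g(d)=d0*8, tabulated for d0 = 1..49
original = {d0 * 2: d0 * 8 for d0 in range(1, 50)}

# d>0 -> g(d)=d*4 (domain mistakenly extended to odd d)
too_wide = {d: d * 4 for d in range(1, 99)}

# d>0, EVEN(d) -> g(d)=d*4
simplified = {d: d * 4 for d in range(1, 99) if d % 2 == 0}

same_function = (original == simplified)
domain_extended = set(too_wide) - set(original)   # the odd values of d
```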


Step 3 : REWRITE - Variables which appear in the parameter
list for g but not in the function expression of its definition
are candidates to be introduced into the function expression.
Each of these variables will be bound to a term in the domain
requirement of the definition. The purpose of this step is to
rewrite the function expression of C2 (based on the properties of
the operation(s) involved) in order to introduce these terms into
the function expression. To illustrate, consider the above
SIMPLIFIED C2 definition. The variable z is a candidate to be
introduced into the function expression (v+1)*k. It is bound to
the term k in the domain requirement. Thus we need to introduce
an additional term k into this function expression. One way to
do this is to translate the expression to v*k+k. Based on this,
we REWRITE C2 as

v>=0, z=k -> g(z,v,k)=v*k+k.

Step 4 : SUBSTITUTE - In steps 2 and 3, the constraint
functions are massaged into equivalent definitions in order to
facilitate step 4. The purpose of this step is to attempt to infer
a general loop function from these constraints. We motivate the
process as follows. Suppose we are searching for a particular
relationship between several quantities, say E, m and c.
Furthermore, suppose that through some form of analysis we have
determined that when m has the value 17, the relationship
E=17*(c**2) holds. A reasonable guess, then, for a general
relationship between E, m and c would be E=m*(c**2). This would be
particularly true if we had reason to suspect that there was a
relatively simple connection between the quantities m and E. We
arrived at the general relationship by substituting the quantity
m for 17 in the relationship which is known to hold when m has
the value 17. Viewed in this light, the purpose of the constraint
function C2 is to obtain a relationship which holds for a
specific value of m (e.g. 17). The step REWRITE exposes the term
17 in this relationship. Finally, SUBSTITUTE substitutes m for
17 in the relationship and proposes the result as a general
relationship between E, m and c. In terms of the multiplication
program being considered, the SUBSTITUTE step calls for replacing
one of the terms k in the above rewritten function expression
with the term z. The two possible substitutions lead to the
following general functions:

v>=0 -> g(z,v,k)=v*k+z

and

v>=0 -> g(z,v,k)=v*z+k.
Both of these (necessarily) are generalizations (i.e. supersets)
of C2; however, only the first is also a generalization of C1.
Hence this function is hypothesized as a description of the
general behavior of the above WHILE loop.
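
The choice between the two candidates can be checked mechanically on
a finite sample. The following Python sketch is an assumption-laden
illustration (the helper agrees and the sample ranges are
hypothetical); it tests each candidate against the SIMPLIFIED
constraint functions C1 (z=0 gives v*k) and C2 (z=k gives (v+1)*k).

```python
def agrees(g, constraint):
    """True if g returns the required value on every sampled constraint pair."""
    return all(g(*state) == value for state, value in constraint)


vs, ks = range(0, 6), range(-3, 4)
C1 = [((0, v, k), v * k) for v in vs for k in ks]          # v>=0, z=0 -> v*k
C2 = [((k, v, k), (v + 1) * k) for v in vs for k in ks]    # v>=0, z=k -> (v+1)*k

first = lambda z, v, k: v * k + z     # substitute z for the second k
second = lambda z, v, k: v * z + k    # substitute z for the first k

results = {
    "first": (agrees(first, C1), agrees(first, C2)),
    "second": (agrees(second, C1), agrees(second, C2)),
}
```

On this sample only the first candidate agrees with both constraint
functions, in line with the conclusion above.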

We have applied the above 4 steps to obtain an hypothesis
for the behavior of the loop in question. Since this description
is sufficiently general (specifically, since the loop is closed
for the domain of the function), we can prove/disprove the
correctness of the hypothesis using standard verification
techniques [Mills 75, Misra 73]. Specifically, the hypothesis is
valid if and only if each of

-  the loop terminates for all v>=0,

-  v=0 -> z=z+v*k, and

-  z+v*k is a loop constant (i.e. z0+v0*k0=z+v*k is a loop
   invariant)

hold. We remark that the loop hypothesis is selected in such a
way that if it holds (i.e. the loop does compute this general
function), the initialized loop is necessarily correct with
respect to f.
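
The conditions above can be spot checked on a finite sample of
states. The Python sketch below is not a proof and is offered only as
an illustration; the function names and ranges are hypothetical.
Termination is not established by the code: it is merely observed for
the sampled states, all of which have v>=0.

```python
def g(z, v, k):
    """Hypothesized loop function g(z,v,k) = z + v*k."""
    return z + v * k


def run_loop(z, v, k):
    """Direct execution of the WHILE loop of Example 1 from state (z, v, k)."""
    while v != 0:
        z, v = z + k, v - 1
    return z


def check_hypothesis(states):
    for z, v, k in states:
        if v == 0:
            if g(z, v, k) != z:                      # exit: v=0 -> z = z+v*k
                return False
        elif g(z + k, v - 1, k) != g(z, v, k):       # z+v*k is a loop constant
            return False
        if run_loop(z, v, k) != g(z, v, k):          # loop result equals g
            return False
    return True


states = [(z, v, k) for z in range(-3, 4) for v in range(6) for k in range(-3, 4)]
hypothesis_ok = check_hypothesis(states)
```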


We emphasize that there are usually an infinite number of
generalizations of the constraint functions C1 and C2, and that,
depending on how REWRITE and SUBSTITUTE are applied, the
technique is capable of generating any one of these
generalizations. For example, REWRITE and SUBSTITUTE applied to
the multiplication example could have produced

C2:  v>=0, z=k -> g(z,v,k)=
         v*k + 3*k + k*k*(v-7)/(4*k) + k*k*k/(k*k)
         - k*k*k*(v-7)/(4*k*k) - k*k*k*3/(k*k)

and

v>=0 -> g(z,v,k)=
         v*k + 3*z + z*z*(v-7)/(4*k) + z*z*z/(k*k)
         - z*z*z*(v-7)/(4*k*k) - z*z*z*3/(k*k)

respectively, where "/" denotes an integer division (with
truncation) infix operator which yields 0 when its denominator is 0.
This last function is also a generalization of C1 and C2.
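
That this contrived function agrees with both constraint functions,
yet is not the function computed by the loop, can be checked on a
sample. The Python sketch below is illustrative; tdiv is a
hypothetical helper implementing the truncating division described
above (Python's own // operator rounds toward negative infinity, so it
is not used directly).

```python
def tdiv(a, b):
    """Integer division truncating toward zero; yields 0 when b is 0."""
    if b == 0:
        return 0
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def contrived(z, v, k):
    return (v * k + 3 * z + tdiv(z * z * (v - 7), 4 * k) + tdiv(z * z * z, k * k)
            - tdiv(z * z * z * (v - 7), 4 * k * k) - tdiv(z * z * z * 3, k * k))


vs, ks = range(0, 8), range(-3, 4)
agrees_c1 = all(contrived(0, v, k) == v * k for v in vs for k in ks)
agrees_c2 = all(contrived(k, v, k) == (v + 1) * k for v in vs for k in ks)

# From state z=1, v=0, k=2 the loop exits at once with z=1,
# but the contrived function yields a different value.
off_constraint = contrived(1, 0, 2)
```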

It has been our experience, however, that many initialized
loops occur in which there exists some relatively simple
connection between different input values of the variables
constrained by initialization and the corresponding result
produced by the WHILE loop. Most often in practice, these
variables are bound to values in the domain requirement of C2
which suggest an application of REWRITE that uncovers this
relationship and leads to a correct hypothesis concerning the
general loop behavior. In the following section we illustrate a
number of example applications of this technique.

3. Applications


Example 2 - The following program computes integer
exponentiation. This example serves to illustrate the use of the
technique when the loop body contains several paths:

{d>=0}
w := 1;
while d ≠ 0 do
    if odd(d) then w := w * c fi;
    c := c*c; d := d/2
od
{w=c0^d0}.


The infix operator ^ appearing in the postcondition represents
integer exponentiation. The first constraint function is easily
obtained:

d0>=0, c=c0, d=d0, w=1 -> g(w,c,d)=c0^d0

and SIMPLIFIES to

C1:  d>=0, w=1 -> g(w,c,d)=c^d.

Since there exist two paths through the loop body, we will obtain
two second constraint functions. The first of these deals with
the path which updates the value of w and is executed when the
input value of d is odd. The function is

d0>0, odd(d0), w=c0, c=c0*c0, d=d0/2 -> g(w,c,d)=c0^d0

which SIMPLIFIES to

C2a:  d>=0, c=w*w -> g(w,c,d)=w^(d*2+1).

The function corresponding to the other loop body path is

d0>0, even(d0), w=1, c=c0*c0, d=d0/2 -> g(w,c,d)=c0^d0

and SIMPLIFIES to

d>=0, w=1, SQUARE(c) -> g(w,c,d)=SQRT(c)^(d*2)

i.e.

C2b:  d>=0, w=1, SQUARE(c) -> g(w,c,d)=c^d

where SQUARE(x) is a predicate which is TRUE iff x is a perfect
square and SQRT(x) is the square root of the perfect square x.
This term is necessary in the domain requirement since the
unSIMPLIFIED function is only defined for values of c which are
perfect squares. Note that C2b is a subset of C1 and hence is of no
additional help in characterizing the general loop function. The
heuristic suggested in REWRITE is to rewrite the function
expression w^(d*2+1) of C2a in terms of w, w*w (so as to introduce
c) and d. The peculiar nature of the exponent in this expression
leads one to the equivalent formula w*((w*w)^d). Applying
SUBSTITUTE in C2a yields

d>=0 -> g(w,c,d)=w*(c^d).

This function is in agreement with (i.e. is a superset of) C1 and
thus is a reasonable hypothesis for the general loop function.
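
As a spot check of this hypothesis, the loop of Example 2 can be
executed from arbitrary intermediate states and compared with
w*(c^d). The Python sketch below is illustrative only; the function
names and sample ranges are assumptions made here, and a finite
sample does not replace the proof.

```python
def run_loop(w, c, d):
    """Direct execution of the WHILE loop of Example 2 from state (w, c, d)."""
    while d != 0:
        if d % 2 == 1:          # odd(d)
            w = w * c
        c, d = c * c, d // 2    # d >= 0, so // matches truncating division
    return w


def g(w, c, d):
    """Hypothesized loop function g(w,c,d) = w*(c^d)."""
    return w * c ** d


states = [(w, c, d) for w in range(-2, 3) for c in range(-3, 4) for d in range(8)]
loop_matches_g = all(run_loop(*s) == g(*s) for s in states)

# C2a: c = w*w -> g(w,c,d) = w^(d*2+1)
c2a_holds = all(g(w, w * w, d) == w ** (d * 2 + 1)
                for w in range(-3, 4) for d in range(6))
```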

In this example, the portion of C2 corresponding to the loop
body path which bypasses the updating of the initialized data is
redundant with C1. Based on this, one might conclude that such
loop body paths should be ignored when constructing C2.
Considering all loop body paths, however, does have the advantage
that an incorrect program could possibly be disproved (at the time
the general loop function is being constructed) by observing an
inconsistency between constraint functions C1 and C2. For
instance, in the example, if the assignment to c had been written
"c:=c*2", the above analysis would have detected an inconsistency
in the constraints on the general loop function. Such an
inconsistency implies that the hypothesis being sought for the
general behavior of the loop does not exist, and hence, that the
program is not correct with respect to its specification.
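
Such an inconsistency can be found by tabulating the constraints and
looking for a loop-entry state that is required to map to two
different values. The Python sketch below is a hypothetical
illustration (find_conflict, its parameters and the sample bounds are
not from the report); it compares the correct assignment c:=c*c with
the faulty c:=c*2.

```python
def find_conflict(update_c, max_c=5, max_d=5):
    """Return (state, value_a, value_b) if two constraints disagree, else None."""
    required = {}                             # state (w, c, d) -> required g value
    for c0 in range(1, max_c + 1):
        for d0 in range(0, max_d + 1):
            spec = c0 ** d0                   # postcondition value c0^d0
            states = [(1, c0, d0)]            # C1: state after w := 1
            if d0 != 0:                       # C2: state after one iteration
                w = c0 if d0 % 2 == 1 else 1
                states.append((w, update_c(c0), d0 // 2))
            for state in states:
                if required.setdefault(state, spec) != spec:
                    return state, required[state], spec
    return None


no_conflict = find_conflict(lambda c: c * c)    # loop body as given
conflict = find_conflict(lambda c: c * 2)       # faulty assignment "c:=c*2"
```

With the faulty assignment, the state w=1, c=2, d=1 is required to
yield 1 by a second constraint (from c0=1, d0=2) and 2 by the first
constraint (from c0=2, d0=1), so no function g can satisfy both.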

In the previous section, the reader may recall that awkwardness
in disproving programs was offered as a disadvantage of a
"top down" approach to synthesizing g. It has been our
experience, however, that, as in the above instance, an error in
the program being considered often manifests itself as an
inconsistency between C1 and C2. Such an inconsistency is usually
"easy" to detect and hence the program is "easy" to disprove.
While it is difficult to give a precise characterization of when
this will occur, intuitively, it will be the case provided that
the "error" (e.g. c*2 for c*c) can be "executed" on the first
iteration of the loop.


Example 3 - The following program counts the number of nodes
in a nonempty binary tree using a set variable s. It differs
from the previous example in that more than 1 variable is
initialized. The tree variable t is the input tree whose nodes are
to be counted. We use the notation left(t) and right(t) for the
left and right subtrees of t respectively. The predicate
empty(t) is TRUE iff t is the empty tree (i.e. contains 0 nodes).

{~empty(t)}
n := 0; s := {t};
while s ≠ {} do
    select and remove some element e from s;
    n := n + 1;
    if ~empty(left(e)) then s := s U {left(e)} fi;
    if ~empty(right(e)) then s := s U {right(e)} fi
od
{n=NODES(t)}

The notation NODES(t) appearing in the postcondition stands for
the number of nodes in binary tree t. The first constraint
function is

C1:  ~empty(t), n=0, s={t} -> g(n,s)=NODES(t).

Rather than considering each of the 4 possible paths through the
loop body individually, we abstract the combined effect of the
two IF statements as the assignment

s := s U SONS(e),

where SONS(x) is the set of 0, 1 or 2 nonempty subtrees of x.
Applying this, the second constraint function is

C2:  ~empty(t), n=1, s=SONS(t) -> g(n,s)=NODES(t).

We choose to REWRITE the function expression for C2 using the
recursive definition that NODES(x) for a nonempty tree x is 1
plus the NODES value of each of the 0, 1 or 2 nonempty subtrees
of x. Specifically, this would be

1+SUM(x,SONS(t),NODES(x))

where SUM(A,B,C) stands for the summation of C over all A ∈ B.

Applying SUBSTITUTE in the obvious way yields

~empty(t) -> g(n,s)=n+SUM(x,s,NODES(x))

which is in agreement with C1 and is thus a reasonable guess for
the general loop function g.
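
The hypothesized function for Example 3 can be compared with the loop
on a small tree. The Python sketch below is illustrative and rests on
assumptions not in the report: trees are modeled by a hypothetical
Node class (None stands for the empty tree), and Node objects hash by
identity, so two structurally equal subtrees remain distinct elements
of the set s.

```python
class Node:
    """A nonempty binary tree; None represents the empty tree."""
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right


def nodes(t):
    """NODES(t): number of nodes in binary tree t."""
    return 0 if t is None else 1 + nodes(t.left) + nodes(t.right)


def run_loop(n, s):
    """Direct execution of the WHILE loop of Example 3 from state (n, s)."""
    s = set(s)
    while s:
        e = s.pop()                      # select and remove some element e
        n = n + 1
        if e.left is not None:
            s.add(e.left)
        if e.right is not None:
            s.add(e.right)
    return n


def g(n, s):
    """Hypothesized loop function g(n,s) = n + SUM(x, s, NODES(x))."""
    return n + sum(nodes(x) for x in s)


t = Node(Node(), Node(Node()))           # a tree with 4 nodes
initial = (0, {t})                       # state after n := 0; s := {t}
later = (3, {t.left, t.right})           # an arbitrary intermediate state
```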


Two remarks are in order concerning this example. The first
deals with the condition ~empty(t) appearing in the domain
requirement of the obtained function. The reader may wonder, if
t is not referenced in the loop (it is not in the parameter list
for g), how can the loop behavior depend on empty(t)? The answer
is that it obviously cannot; the above function is simply
equivalent to

g(n,s)=n+SUM(x,s,NODES(x)).

For the remainder of the examples of this section, we assume that
these unnecessary conditions are removed from the domain
requirement of the constraint function as part of the SUBSTITUTE
step.

As a second point, in Example 3 we encounter the case where
the obtained function is, strictly speaking, too general, in that
its domain includes "unusual" inputs for which the behavior of
the loop does not agree with the function. For instance, in the
example, the loop computes the function

g(n,s)=n+SUM(x,s,NODES(x))

only under the provision that the set s does not contain the
empty tree. This is normally not a serious problem in practice.
One proceeds as before, i.e. attempts to push through a proof of
correctness using the inferred function. If the proof is
successful, the program has been verified; otherwise, the
characteristics of the input data which cause the verification
condition(s) to fail (e.g. s contains an empty tree) suggest an
appropriate restriction of the input domain (e.g. s contains only
nonempty trees) and the program can then be verified using this
new, restricted function.

Example 4 [Gries 79] - Ackermann's function A(m,n) can be
defined as follows for all natural numbers m and n:

     A(0,n)     = n+1

     A(m+1,0)   = A(m,1)

     A(m+1,n+1) = A(m,A(m+1,n)).

The following program computes Ackermann's function using a
sequence variable s of natural numbers. The notation s(1) is the
rightmost element of s and s(2) is the second rightmost, etc.
The sequence s(..3) is s with s(2) and s(1) removed. We will use
< and > to construct sequences, i.e. a sequence s consisting of n
elements will be written <s(n), ... ,s(2),s(1)>.

     {m>=0, n>=0}
     s := <m,n>;
     while size(s) ≠ 1 do
        if s(2) = 0 then s := s(..3) || <s(1)+1>
        else if s(1) = 0 then s := s(..3) || <s(2)-1,1>
        else s := s(..3) || <s(2)-1,s(2),s(1)-1> fi
     od
     {s=<A(m,n)>}
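The program above can be transcribed almost line for line. The following is a minimal Python sketch, not the authors' code: a Python list stands in for the sequence s, with the last list element playing the role of s(1), and the function names `ackermann` and `ackermann_seq` are chosen here for illustration.

```python
def ackermann(m: int, n: int) -> int:
    """Recursive definition of A(m, n) as stated in the text."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))


def ackermann_seq(m: int, n: int) -> int:
    """Sequence-based loop; s[-1] is s(1) and s[-2] is s(2)."""
    s = [m, n]
    while len(s) != 1:
        s2, s1 = s[-2], s[-1]
        rest = s[:-2]                      # s(..3): s without s(2) and s(1)
        if s2 == 0:
            s = rest + [s1 + 1]
        elif s1 == 0:
            s = rest + [s2 - 1, 1]
        else:
            s = rest + [s2 - 1, s2, s1 - 1]
    return s[0]
```

Only small arguments are practical, since the value (and the recursion depth of the reference definition) grows extremely quickly with m.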

For this program, the first constraint function is

     C1:  m>=0, n>=0, s=<m,n>  ->  g(s)=<A(m,n)>.

The second constraint functions corresponding to the 3 paths
through the loop body are

     C2a: m=0, n>=0, s=<n+1>        ->  g(s)=<A(m,n)>

     C2b: m>0, n=0,  s=<m-1,1>      ->  g(s)=<A(m,n)>

     C2c: m>0, n>0,  s=<m-1,m,n-1>  ->  g(s)=<A(m,n)>.

REWRITING these 3 based on the above definition of A yields

     m=0, n>=0, s=<n+1>        ->  g(s)=<n+1>

     m>0, n=0,  s=<m-1,1>      ->  g(s)=<A(m-1,1)>

     m>0, n>0,  s=<m-1,m,n-1>  ->  g(s)=<A(m-1,A(m,n-1))>.

SUBSTITUTING here yields

     s=<s(1)>            ->  g(s)=<s(1)>

     s=<s(2),s(1)>       ->  g(s)=<A(s(2),s(1))>

     s=<s(3),s(2),s(1)>  ->  g(s)=<A(s(3),A(s(2),s(1)))>.

Note that the second of these functions implies C1. These
functions seem to suggest the general loop behavior (where n>1)
     g(<s(n),s(n-1), ... ,s(1)>) =

          <A(s(n), A(s(n-1), ... A(s(2),s(1)) ... ))>.


We remark that in the first 3 examples, the heuristic
resulted in a loop function which was sufficiently general (i.e.
the loop was closed for the domain of the inferred function).
Example 4 illustrates that this does not always occur. The loop
function heuristic is helpful in the example in that it suggests
a behavior of the loop for general sequences of length 1, 2 and
3. Based on these results, the verifier is left to infer a
behavior for a sequence of arbitrary length.
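The inferred behavior for a sequence of arbitrary length can be checked experimentally before attempting a proof. The sketch below is an illustration under assumptions, not part of the original method: `g` folds A over the sequence from the right, `step` performs one execution of the loop body, and `holds_along_run` confirms that the value of g stays unchanged after every iteration. All three names are hypothetical.

```python
def A(m: int, n: int) -> int:
    """Ackermann's function, following the recursive definition in the text."""
    if m == 0:
        return n + 1
    if n == 0:
        return A(m - 1, 1)
    return A(m - 1, A(m, n - 1))


def g(s: list) -> int:
    """Conjectured loop function: A(s(n), A(s(n-1), ... A(s(2), s(1)) ...))."""
    acc = s[-1]                        # start with s(1)
    for x in reversed(s[:-1]):         # then s(2), s(3), ..., s(n)
        acc = A(x, acc)
    return acc


def step(s: list) -> list:
    """One execution of the loop body on a sequence of length > 1."""
    s2, s1, rest = s[-2], s[-1], s[:-2]
    if s2 == 0:
        return rest + [s1 + 1]
    if s1 == 0:
        return rest + [s2 - 1, 1]
    return rest + [s2 - 1, s2, s1 - 1]


def holds_along_run(m: int, n: int) -> bool:
    """Check that g is preserved by every iteration starting from <m,n>."""
    s = [m, n]
    expected = g(s)
    while len(s) != 1:
        s = step(s)
        if g(s) != expected:
            return False
    return s[0] == expected
```

A successful run on a few inputs is only evidence for the conjecture; the verification conditions still have to be proved.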

Example 5 - Let v be a one dimensional array of length n>0
which contains natural numbers. The following program finds the
maximum element in the array:

     m := 0; i := 1;
     while i <= n do
        if m < v[i] then m := v[i] fi;
        i := i + 1
     od
     {m=BIGGEST(v)}


The notation BIGGEST(v) appearing in the postcondition stands for
the largest element of v. The following constraint functions are
obtained

     C1:  m=0,    i=1  ->  g(m,i,v,n)=BIGGEST(v)

     C2:  m=v[1], i=2  ->  g(m,i,v,n)=BIGGEST(v).

Noticing the appearance of v[1] and 2 in C2 leads us to REWRITE
BIGGEST(v) in C2 as MAX(v[1],BIGGEST(v[2..n])), where MAX returns
the largest of its two arguments, and v[2..n] is a notation for
the subarray of v within the indicated bounds. The generalization
which suggests itself,

     g(m,i,v,n)=MAX(m,BIGGEST(v[i..n])),
agrees with C1.
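The generalization can be exercised on sample data. The following sketch assumes that BIGGEST of an empty subarray is 0 (natural numbers only), an assumption needed once i exceeds n; `biggest`, `g` and `run_and_check` are illustrative names. The parameter n of g is implicit in the length of the list.

```python
def biggest(xs: list) -> int:
    """BIGGEST of natural numbers; taken as 0 for an empty subarray (assumption)."""
    return max(xs, default=0)


def g(m: int, i: int, v: list) -> int:
    """Conjectured loop function MAX(m, BIGGEST(v[i..n])) with 1-based index i."""
    return max(m, biggest(v[i - 1:]))


def run_and_check(v: list) -> int:
    """Run the maximum-finding loop and check that g is preserved each iteration."""
    n = len(v)
    m, i = 0, 1
    target = g(m, i, v)
    while i <= n:
        if m < v[i - 1]:
            m = v[i - 1]
        i += 1
        assert g(m, i, v) == target
    return m
```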

Example 6 - If p is a pointer to a node in a binary tree,
let POST(p) be the sequence of pointers which point to the nodes
in a postorder traversal of the binary tree pointed to by p. The
following program constructs POST(p) in a sequence variable vs
using a stack variable stk. We use the notation l(p) and r(p)
for the pointers to the left and right subtrees of the tree
pointed to by p. If p has the value NIL, POST(p) is the empty
sequence. The variable rt points to the root of the input tree
to be traversed.

     p := rt; stk := EMPTY; vs := <>;
     while ~(p=NIL & stk=EMPTY) do
        if p≠NIL then
           stk <= p   /* push p onto stk */ ;
           p := l(p)
        else
           p <= stk   /* pop stk */ ;
           vs := vs || <p>;
           p := r(p) fi
     od
     {vs = POST(rt)}.

Up until now, we have attempted to infer a general loop function
from two constraint functions. Of course, there is nothing
special about the number two. In this example, the "connection"
between the initialized variables and the function values is not
clear from the first two constraint functions and it proves
helpful to obtain a third constraint function. Functions C1 and C2
correspond to 0 and 1 loop body executions, respectively. The
third constraint function C3 will correspond to 2 loop body
executions. We will use the notation (e1, ... ,en) for a stack
containing the elements e1, ... ,en from top to bottom. The
constraint functions for this program are

     C1:  p=rt, stk=EMPTY, vs=<> ->

          g(p,stk,vs)=POST(rt)

     C2:  rt≠NIL, p=l(rt), stk=(rt), vs=<> ->

          g(p,stk,vs)=POST(rt)

     C3a: rt≠NIL, l(rt)≠NIL, p=l(l(rt)), stk=(l(rt),rt), vs=<> ->

          g(p,stk,vs)=POST(rt)

     C3b: rt≠NIL, l(rt)=NIL, p=r(rt), stk=EMPTY, vs=<rt> ->

          g(p,stk,vs)=POST(rt).

Note that there are two third constraint functions. C3a and C3b
correspond to executions of the first and second loop body paths
(on the second iteration), respectively. There is only 1 second
constraint function since only the first loop body path can be
executed on the first iteration. Using the recursive definition
of POST, we REWRITE C2, C3a and C3b as follows:

     C2':  rt≠NIL, p=l(rt), stk=(rt), vs=<> ->

           g(p,stk,vs)=POST(l(rt)) || <rt> || POST(r(rt))

     C3a': rt≠NIL, l(rt)≠NIL, p=l(l(rt)), stk=(l(rt),rt), vs=<> ->

           g(p,stk,vs)=POST(l(l(rt))) || <l(rt)> || POST(r(l(rt)))
                       || <rt> || POST(r(rt))

     C3b': rt≠NIL, l(rt)=NIL, p=r(rt), stk=EMPTY, vs=<rt> ->

           g(p,stk,vs)=<rt> || POST(r(rt)).

Applying SUBSTITUTE to each of C2', C3a' and C3b' suggests
     stk=(e1), vs=<> -> g(p,stk,vs)=POST(p) || <e1> || POST(r(e1))

     stk=(e1,e2), vs=<> -> g(p,stk,vs)=POST(p) || <e1> || POST(r(e1))
                                       || <e2> || POST(r(e2))

     stk=EMPTY -> g(p,stk,vs)=vs || POST(p)

respectively. The first 2 of these functions imply the following
behavior for an arbitrary stack where vs has the value <>:
     stk=(e1, ..., en), vs=<> -> g(p,stk,vs) =

          POST(p) || (<e1> || POST(r(e1)) || ... || <en> || POST(r(en)))
and in combination with the last function, the general behavior
     stk=(e1, ..., en) -> g(p,stk,vs) =

          vs || POST(p) || (<e1> || POST(r(e1)) || ... || <en> || POST(r(en)))
is suggested.
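The suggested general behavior can be tested by running the traversal loop and evaluating the candidate function after every iteration. The sketch below makes several assumptions: nodes are a small hypothetical `Node` class with string keys, POST follows the REWRITTEN form used above (left subtree, node, right subtree, which is the order now usually called inorder), and each stacked element e contributes <e> || POST(r(e)), matching the one- and two-element stack cases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def post(p: Optional[Node]) -> list:
    """POST as REWRITTEN in the text: left subtree, node, right subtree."""
    if p is None:
        return []
    return post(p.left) + [p.key] + post(p.right)


def g(p: Optional[Node], stk: list, vs: list) -> list:
    """Candidate loop function; stk is listed from top to bottom."""
    out = vs + post(p)
    for e in stk:
        out = out + [e.key] + post(e.right)
    return out


def traverse(rt: Optional[Node]) -> list:
    """Stack-based loop; asserts that g is unchanged by every iteration."""
    p, stk, vs = rt, [], []
    expected = g(p, stk, vs)
    while not (p is None and not stk):
        if p is not None:
            stk.insert(0, p)        # push p onto stk
            p = p.left
        else:
            p = stk.pop(0)          # pop stk
            vs = vs + [p.key]
            p = p.right
        assert g(p, stk, vs) == expected
    return vs
```

For the four-node tree built as `Node("b", Node("a"), Node("d", Node("c")))`, the loop visits the keys in the order a, b, c, d.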


In this section we have illustrated the use of our technique
on a number of example programs. The reader has seen that the
success of the method hinges largely on the way REWRITE is
performed. What guidelines can be used in deciding how to apply
this step? The general rule given above is to identify the
variables that need to be introduced into the expression and then
to rewrite the expression using the terms to which these variables
are bound. For instance, in Example 3, NODES(t) was rewritten
using the terms 1 and SONS(t). Beyond this rule, however, the
reader may have noticed an additional similarity in the way
REWRITE was applied in these examples. If f is the function or
operation the initialized loop program is intended to compute,
each REWRITE step involved decomposing f in some way. In
Example 1, for instance, a multiplication operation was decomposed
into an addition and multiplication operation; in Example 3, a
NODES operation was decomposed into a summation and a number of
NODES operations; in Example 5, a BIGGEST operation was decomposed
into a MAX and a BIGGEST operation. In Section 6 we will
characterize this idea of decomposing the intended operation of
the initialized loop program and discuss several implications of
the characterization for the proposed technique.

In Example 6, we saw that the technique generalizes to the
use of 3 (and indeed an arbitrary number of) constraint
functions. We have seen that each of these functions defines a
subset of the general loop function g being sought. If the
constraint functions themselves are sufficiently general, it may be
that the first several of these functions, taken collectively,
constitute a complete description of g. We consider this
situation in the following section.


4. Complete Constraints

The technique described above for obtaining a general loop
function is "nondeterministic" in that the constraint functions
do not precisely identify the desired function; rather they serve
as a formal basis from which intelligent guesses can be made
concerning the general behavior of the loop. Our belief is that it
is often easy for a human being to fill in the remaining "pieces"
of the loop function "picture" once this basis has been
established.

There exist, however, circumstances when the constraints do
constitute a complete description of an adequate loop function.
Specifically, this description may be complete through the use of
1, 2 or more of the constraint functions. The significance of
these situations is that no guessing or "filling in the picture"
is necessary; the program can be proven/disproven correct using
the constraints as the general loop function. In this section we
give a formal characterization of this circumstance.


Definition - For some N > 0, an initialized loop is N-closed
with respect to its specification f iff the union of the
constraint functions C1,C2, ... ,CN is a function g such that the
loop is closed for the domain of g. In this case, the
constraints C1,C2, ... ,CN are complete.




Thus if a loop is N-closed for some N>0, the union of the
first N constraint functions constitutes an adequate loop
function for the loop under consideration. Intuitively, the value N
is a measure of how quickly (in terms of the number of loop
iterations) the variables constrained by initialization take on
"general" values.


Example 7 - The following program

     {b>=0}
     a := a + 1;
     while b > 0 do
        a := a + 1;
        b := b - 1
     od
     {a = a0 + b0 + 1}

is 1-closed since the first constraint function is
     C1:  b0>=0, a=a0+1, b=b0 -> g(a,b)=a0+b0+1

which SIMPLIFIES to

     b>=0 -> g(a,b)=a+b

and the loop is closed for the domain of this function. Thus C1
by itself defines an adequate loop function.
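Closure for the domain b>=0 means that every state reached by the loop body from a state in the domain is again in the domain, so the function can be evaluated after each iteration. A minimal sketch of this check follows; the names `g` and `run` are illustrative and the domain test is made explicit so that leaving the domain would raise an error.

```python
def g(a: int, b: int) -> int:
    """Loop function obtained by SIMPLIFYING C1; defined only for b >= 0."""
    if b < 0:
        raise ValueError("state outside the domain b >= 0")
    return a + b


def run(a0: int, b0: int) -> int:
    """Initialized loop of Example 7 with a check of g after each iteration."""
    a, b = a0 + 1, b0                  # initialization a := a + 1
    expected = g(a, b)
    while b > 0:
        a, b = a + 1, b - 1
        assert g(a, b) == expected     # new state is still in the domain
    return a
```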

Initialized loops which are 1-closed seem to occur rarely in
practice. Somewhat more frequently, an initialized loop will be
2-closed. For these programs, the loop function synthesis
technique described above (using 2 constraint functions) is
deterministic.


Example 8a - Consider the program

     sum := 0;
     while seq ≠ EMPTY do
        sum := sum + head(seq);
        seq := tail(seq)
     od
     {sum=SIGMA(seq0)}.

The notation SIGMA(seq0) appearing in the postcondition stands
for the sum of the elements in the sequence seq0. The program is
2-closed since the second constraint function is
     C2:  seq0≠EMPTY, sum=head(seq0), seq=tail(seq0) ->

          g(sum,seq)=SIGMA(seq0)

which SIMPLIFIES to

     g(sum,seq)=sum+SIGMA(seq).

The loop is trivially closed for the domain of this function.

Example 8b - As a second illustration of a 2-closed
initialized loop, the following program tests whether a particular key
appears in an ordered binary tree.

     success := FALSE;
     while tree ≠ NULL & ~success do
        if name(tree) = key then success := TRUE
        elseif name(tree) < key then tree := right(tree)
        else tree := left(tree) fi
     od
     {success = IN(key,tree0)}


The notation IN(key,tree0) is a predicate which is true iff key
occurs in ordered binary tree tree0. This program is also
2-closed. Note that the first constraint function
     C1:  success=FALSE, tree=tree0 ->

          g(success,tree,key)=IN(key,tree0)

SIMPLIFIES to

     success=FALSE -> g(success,tree,key)=IN(key,tree).

If we consider the first path through the loop body, the second
constraint function is

     C2:  success=TRUE, tree0≠NULL, tree=tree0, key=name(tree) ->

          g(success,tree,key)=IN(key,tree0)

which SIMPLIFIES to

     success=TRUE, tree≠NULL, key=name(tree) ->

          g(success,tree,key)=IN(key,tree).

Although the domain of the union of these two functions is
somewhat restricted, i.e.

     {<success,tree,key> |
          (~success) OR (tree≠NULL & key=name(tree))},
the loop is nevertheless closed for this domain and hence the
initialized loop is 2-closed.


Example 8c - Consider the sequence of initialized loops
P1,P2,P3 ... defined as follows for each I>0:

     PI:  {x>=0}
          x := x * I;
          while x > 0 do
             x := x - 1;
             y := y + k
          od
          {y=y0 + x0*I*k}.

For any I>0, the first I constraint functions for program PI are
     C1:  x0>=0, x=x0*I, y=y0 -> g(x,y,k)=y0+x0*I*k

     C2:  x0>=1, x=x0*I-1, y=y0+k -> g(x,y,k)=y0+x0*I*k

     ...

     CI:  x0>=I-1, x=x0*I-(I-1), y=y0+k*(I-1) -> g(x,y,k)=y0+x0*I*k.
These SIMPLIFY to

     x>=0, MI(x) -> g(x,y,k)=y+x*k

     x>=0, MI(x+1) -> g(x,y,k)=y+x*k

     ...

     x>=0, MI(x+(I-1)) -> g(x,y,k)=y+x*k
where MI is a predicate which is TRUE iff its argument is a
multiple of I. Since the union of these is the function
     x>=0 -> g(x,y,k)=y+x*k,

and the loop is closed for the domain of this function, we
conclude that for each I>0, program PI is I-closed.

For many initialized loops which seem to occur in practice,
however, there does not exist an N such that they are N-closed
with respect to their specifications. This means that no finite
number of constraint functions will pinpoint the appropriate
generalization exactly; i.e. when applying the above technique in
these situations, some amount of inferring or guessing will
always be necessary. A case in point is the integer
multiplication program from Example 1. The constraint functions C1,C2,C3,
... define the general loop behavior for z=0, z=k, z=2*k, ...
etc. The program cannot be N-closed for any N since with input
v=N+1, the last value of z will be (N+1)*k which is not in the
domain of any of these constraint functions.
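The argument can be made concrete with a few lines of code. The sketch below assumes that the loop of Example 1 has the form z := 0; while v ≠ 0 do z := z + k; v := v - 1 od, which is consistent with the values z=0, z=k, z=2*k listed above; the function names are hypothetical.

```python
def z_values_covered(n_constraints: int, k: int) -> set:
    """Values of z in the domains of C1..CN: 0, k, ..., (N-1)*k."""
    return {j * k for j in range(n_constraints)}


def z_values_visited(v0: int, k: int) -> list:
    """Successive values of z for the assumed loop, starting from z = 0."""
    z, v, seen = 0, v0, [0]
    while v != 0:
        z, v = z + k, v - 1
        seen.append(z)
    return seen
```

With N=3 and k=5 the constraints cover z in {0, 5, 10}, while the input v=4 drives z to 20, a value none of the three constraint functions describes.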


As a final comment concerning N-closed initialized loops, it
may be instructive to consider the following intuitive view of
these programs. All 1-closed and 2-closed initialized loops
share the characteristic that they are "forgetful", i.e. they
soon lose track of how "long" they have been executing and lack
the necessary data to recover this information. This is due to
the fact that intermediate data states which occur after an
arbitrary number of iterations are indistinguishable from data states
which occur after 0 (or 1) loop iterations. To illustrate,
consider the 2-closed initialized loop of Example 8a which sums the
elements contained in a sequence. After some arbitrary number of
iterations in an execution of this program, suppose we stop it
and inspect the values of the program variables sum and seq.
Based on these values, what can we tell about the history of the
execution? The answer is not too much; about all we can say is
that if sum is not zero then we know we have previously executed
at least 1 loop iteration, but the exact number of these
iterations may be 1, 10 or 10000.
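The "forgetful" character of the summation loop can be observed directly: two executions with different inputs reach the identical state (sum, seq) after different numbers of iterations. The following sketch records the states seen at the loop head; `states` is an illustrative helper, with tuples standing in for sequences.

```python
def states(seq0) -> list:
    """All (sum, seq) states seen at the loop head of the summation loop."""
    total, seq = 0, tuple(seq0)
    seen = [(total, seq)]
    while seq:                                   # seq is not EMPTY
        total, seq = total + seq[0], seq[1:]     # head / tail
        seen.append((total, seq))
    return seen
```

The state (6, (4,)) occurs after 1 iteration on the input (6, 4) and after 3 iterations on the input (1, 2, 3, 4), so the state alone does not reveal how long the loop has been running.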

By way of contrast, again consider the integer
multiplication program of Example 1, an initialized loop we know not to be
N-closed for any N. Suppose we stop the program after an
arbitrary number of iterations in its execution. Based on the values
of the program variables z, v and k, what can we tell about the
history of the execution? This information tells us a great
deal; for example, we know the loop has iterated exactly z/k
times and we can reconstruct each previous value of the variable
z.


Initialized loops which have the information available to
reconstruct their past have the potential to behave in a "tricky"
manner. By "tricky" here, we mean performing in such a way that
depends unexpectedly on the history of the execution of the loop
(i.e. on the effect achieved by previous loop iterations). The
result of this loop behavior would be a loop function which was
"inconsistent" across all values of the loop inputs and which
could only be inferred from the constraint functions with
considerable difficulty. We consider this phenomenon more carefully in
the following section; for now we emphasize that it is precisely
the potential to behave in this unpleasant manner that is lacking
in 1-closed and 2-closed initialized loops and which allows their
general behavior to be described completely by the first 1 or 2
constraint functions.

5. "Tricky" Programs

The above heuristic suggests inferring g from 2 subsets of
that function, C1 and C2. Constraint function C2 is of
particular importance since REWRITE and SUBSTITUTE are applied to this
function and it, consequently, serves to guide the generalization
process. C2 is based on the program specification f, the
initialization and the input/output behavior of the loop body on its
first execution. In any problem of inferring data concerning
some population based on samples from that population, the
accuracy of the results depends largely on how representative the
samples are of the population as a whole. The degree to which
the sample defined in C2 is representative of the unknown
general function we are seeking depends entirely on how
representative the input/output behavior of the loop body on the first loop
iteration is of the input/output behavior of the loop body on an
arbitrary subsequent loop iteration.

To give the reader the general idea of what we have in mind,
consider the program to count the nodes in a binary tree in
Example 3. If the loop body did something peculiar when, for
example, the set s contained 2 nodes with the same parent node, or
when n had the value 15, the behavior of the loop body on its
first execution would not be representative of its general
behavior. By "peculiar" here, we mean something that would not
have been anticipated based solely on input/output observations
of its initial execution. An application of our heuristic on
programs of this nature would almost certainly fail since
(apparently) vital information would be missing from C1 and C2.


Example 9 - Consider applying the technique to the following
program which is an alternative implementation of the integer
multiplication program presented in Example 1:

     {v>=0}
     z := 0;
     while v ≠ 0 do
        if z=0 then z := k
        elseif z=k then z := z * 2 * v
        else z := z - k fi;
        v := v - 1
     od
     {z=v0*k}.


The constraint functions C1 and C2 are identical to those for the
program in Example 1 and we have no reason to infer a different
function g. Yet this function is not only an incorrect
hypothesis, it does not even come close to describing the general
behavior of the loop. The difficulty is that the behavior of the
loop body on its first execution is in no way typical of its
general behavior. This is due to the high dependence of the loop
body behavior on the input value of the initialized variable z.
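A short experiment shows both halves of this observation: the initialized program meets its specification, yet the function one would naturally infer, here taken to be g(z,v,k)=z+v*k as for the ordinary multiplication loop (an assumption, since the inference for Example 1 is made elsewhere), fails on loop states that do not arise from the initialization. The names `loop`, `tricky` and `guess` are illustrative.

```python
def loop(z: int, v: int, k: int) -> int:
    """The WHILE loop alone, started from an arbitrary state (z, v, k)."""
    while v != 0:
        if z == 0:
            z = k
        elif z == k:
            z = z * 2 * v
        else:
            z = z - k
        v -= 1
    return z


def tricky(v0: int, k: int) -> int:
    """The initialized loop: z := 0 followed by the WHILE loop."""
    return loop(0, v0, k)


def guess(z: int, v: int, k: int) -> int:
    """Hypothesized loop function z + v*k (assumed form of the inference)."""
    return z + v * k
```

Starting the loop alone from z=10, v=3, k=5 gives 15, whereas the hypothesized function predicts 25.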


We make the following remarks concerning programs of this
nature. First, our experience indicates that they occur very
rarely in practice. Secondly, because they tend to be quite
difficult to analyze and understand, we consider them "tricky" or
poorly structured programs. Thirdly, the question of whether the
(input/output) behavior of the loop body on the first iteration
is representative of its behavior on an arbitrary subsequent
iteration is really a question of whether its behavior when the
initialized variables have their initial values is representative
of its behavior when the initialized variables have "arbitrary"
values. Put still another way, the question is whether the loop
body behaves in a "uniform" manner across the spectrum of
possible values of the initialized data.


In practice, a consequence of a loop body exhibiting this
uniform behavior is that there exists a simply expressed
connection between different input values of the initialized data and
the corresponding result produced by the WHILE loop. It is the
existence of such a connection which motivates the SUBSTITUTE
step above and which is thus a necessary precondition for a
successful application of the technique. This explains its failure
in dealing with programs such as that in Example 9. We make no
further mention of these "tricky" programs, and in the following
section discuss an informal categorization of "reasonable"
programs and consider its implications for our loop function
synthesis technique.

6. BU and TD Loops


In this section, we discuss general characteristics of
commonly occurring iterative programs. These characteristics are
used to suggest two categories of these programs. The technique
for obtaining loop functions is particularly useful when applied
to initialized loops in one of these categories.

In solving any particular problem, it often makes sense to
consider certain instances of the problem as being "easier" or
"harder" to solve than other instances. For example, with the
problem of sorting a table, an instance of the problem for a
table containing N elements might be harder to solve than an
instance of the problem for a table containing N-1 elements.
Similarly, if the problem is multiplying natural numbers, a*b
might be easier to solve than (a+1)*b. This notion of "easier"
and "harder" instances of a problem is particularly apparent for
problems with natural recursive solutions. These solutions solve
complex instances in terms of less complex instances and hence
support the idea of one problem instance being easier to solve
than another.

For the purpose of this discussion, we divide the data
modified by the initialized loop under consideration into two
sections: the accumulating data and the control data. The
accumulating data is the specified output variable(s) of the loop. The
remaining modified data is the control data and often serves to
"guide" the execution of the loop and determine the point at
which the loop should terminate. Both the accumulating data and
the control data are typically (but not always) constrained by
initialization in front of the loop.

Example 10a - In the program

     {n>=0}
     z := 1; t := 0;
     while t ≠ n do
        t := t + 1;
        z := z * t
     od
     {z=n!}

the variable z is the specified output of the loop and is hence
the accumulating data. The other modified variable, t, is used
to control the termination of the loop and is the control data.

In many cases, the control data can be viewed as
representing an instance (or perhaps several instances) of the problem
being solved. As the loop executes and the control data changes,
the control data represents different instances of this problem.
To illustrate, we can think of the control data t in the previous
example as a variable describing a particular instance of the
factorial problem. As the loop executes, the variable t takes on
the values 0, 1, ..., n, and these values can be thought to
correspond to the problems 0!, 1!, ..., n!.


Based on these informal observations, we characterize a BU
(from the Bottom Upward) loop as one where the control data
problem instances are generated in order of increasing complexity,
beginning with a simple instance and ending with the input
problem instance to be solved. In the execution of a BU loop, the
control data can be viewed as representing the "work" that has
been accomplished "so far." We consider the factorial program
above to be a BU loop. At any point in time, the "work" so far
accomplished is t! and t moves from 0 (a simple factorial
instance) to n (the input factorial instance).
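The BU reading of the factorial loop can be made visible by recording the control data and checking, after each iteration, that the accumulating data equals the factorial of the current control value. This is a small illustrative sketch; the function name `factorial_bu` and the returned trace are additions for demonstration.

```python
from math import factorial


def factorial_bu(n: int):
    """Factorial loop of Example 10a; returns the result and the values taken by t."""
    z, t = 1, 0
    trace = [t]
    while t != n:
        t += 1
        z *= t
        assert z == factorial(t)    # work accomplished "so far" is t!
        trace.append(t)
    return z, trace
```

The trace runs upward from the simple instance 0 to the input instance n, which is the sense in which the loop works from the bottom up.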

Conversely, we characterize a TD (from the Top Downward)
loop as one where the control data problem instances are
generated in order of decreasing complexity, beginning with the
input problem instance and ending with a simple problem instance.
In the execution of a TD loop, the control data can be viewed as
representing the "work" that remains to be done.

Example 10b - We consider the following alternative
implementation of factorial to be a TD loop:




     A:   {y>=0}
          w := 1; t := 0;
          while t≠y do
             w := w*x;
             t := t+1
          od
          {w=x^y}

     B:   {y>=0}
          w := 1; t := y;
          while t≠0 do
             w := w*x;
             t := t-1
          od
          {w=x^y}

     C:   {y>=0}
          w := 1; c := x; t := y;
          while t≠0 do
             if odd(t) then
                w := w*c fi;
             c := c*c; t := t/2
          od
          {w=x^y}


As before, the symbol ^ is used as an infix exponentiation
operator. We consider program A to be a BU loop. The control data t
moves from 0 to y and corresponds to the problem instances x^0,
..., x^y. On the other hand, B is TD since the control data t
moves from y to 0 and corresponds to the problem instances x^y,
..., x^0. Program C (similar to that in Example 2) is slightly
more difficult to analyze. The control data is the pair <c,t>.
The pair is initialized to <x,y> and ends with the value <c',0>,
where c' is some complex function of x and y. It seems
reasonable to consider <c,t> as representing the problem c^t. Hence we
conclude C is also TD. This conclusion also makes sense in light
of the fact that C is really an optimized version of B which
saves iterations by exploiting the binary decomposition of y.
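The three exponentiation programs can be compared side by side. The sketch below transcribes A, B and C into Python (integer division stands in for t/2) and returns the result together with the number of iterations; for C it also asserts the TD reading that the remaining problem c^t combined with w always equals x^y. The function names are illustrative.

```python
def power_a(x: int, y: int):
    """Program A (BU): t moves from 0 up to y."""
    w, t, iterations = 1, 0, 0
    while t != y:
        w, t = w * x, t + 1
        iterations += 1
    return w, iterations


def power_b(x: int, y: int):
    """Program B (TD): t moves from y down to 0."""
    w, t, iterations = 1, y, 0
    while t != 0:
        w, t = w * x, t - 1
        iterations += 1
    return w, iterations


def power_c(x: int, y: int):
    """Program C (TD): the pair <c,t> represents the remaining problem c^t."""
    w, c, t, iterations = 1, x, y, 0
    while t != 0:
        if t % 2 == 1:              # odd(t)
            w = w * c
        c, t = c * c, t // 2
        iterations += 1
        assert w * c ** t == x ** y
    return w, iterations
```

For x=2 and y=10, programs A and B each take 10 iterations while C takes 4, which reflects the saving obtained from the binary decomposition of y.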


The characterization of BU and TD loops described here is,
of course, an informal one and depends largely on one's
interpretation of the meaning or purpose of the control data. We have
classified the above programs by using what we considered to be
the most natural or intuitive interpretation; other
interpretations are always possible. Occasionally, two different
interpretations of the control data seem equally valid, and hence a
program may be BU from one point of view and TD from another point
of view. For example, consider the following program which adds up
the elements in a subarray between indices p1 and p2:


     sum := 0; i := p1;
     while i <= p2 do
        sum := sum + a[i];
        i := i + 1
     od
     {sum=ASUM(a[p1..p2])}


The notation ASUM(a[p1..p2]) appearing in the postcondition
stands for the summation of the elements in the indicated
subarray. The question which arises in attempting to classify this
program is as follows: as the control data i moves through the
values p1, p1+1, ..., p2, is it most appropriate to think of it
as representing the problem instance which has been solved (i.e.
ASUM(a[p1..i])) or as representing the problem instance which
remains to be solved (i.e. ASUM(a[i..p2]))? Both views seem
equally intuitive, that is, the program seems to be as much BU as
TD.
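Both readings of the subarray summation loop can be checked in the same run. In the sketch below, indices are 1-based and inclusive as in the text, and at the loop head the solved part is taken to be a[p1..i-1] (an indexing assumption, since i there names the next element to be added); `asum` and `subarray_sum` are illustrative names.

```python
def asum(a: list, lo: int, hi: int) -> int:
    """Sum of a[lo..hi], 1-based and inclusive; 0 when lo > hi."""
    return sum(a[lo - 1:hi])


def subarray_sum(a: list, p1: int, p2: int) -> int:
    """Summation loop with the BU and the TD view asserted after each iteration."""
    whole = asum(a, p1, p2)
    s, i = 0, p1
    while i <= p2:
        s += a[i - 1]
        i += 1
        assert s == asum(a, p1, i - 1)         # BU view: instance solved so far
        assert s + asum(a, i, p2) == whole     # TD view: instance still remaining
    return s
```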


As a final example, we refer back to the program in Example
3 which counts the nodes in a binary tree. It is clear that n and
the set variable s are the accumulating and control data
respectively. Initially, s contains the tree whose nodes are to be
counted; when the program terminates s is empty. In between, s
contains various subtrees of the original tree. It seems natural
to view the set as containing progressively simpler and simpler
instances of the NODES problem, since the trees in s consist of
subtrees of the trees previously in s. We thus view the program
as a TD loop.

We have seen that the problem solving method taken by a BU
loop is one of approaching the general problem instance from some
simple problem instance. Of course, this problem solving method
is reasonable only when there exists some technique whereby one
is guaranteed to "run into" the general problem instance. Our
view is that in many cases, such a convergence technique either
does not exist or requires so much support that the BU approach
is not practical. This appears to be particularly true for
programs dealing with sophisticated data types (i.e. something other
than integers) and for programs requiring a high degree of
efficiency in their number of iterations.

To help see this point of view, again consider the NODES
program of Example 3. Previously we argued that this was a TD
program. What would a BU program which computed the same
function look like? The following program skeleton suggests itself:

    n := 0; t1 := "an empty tree";
    while t1 /= t do
        "add a node to t1 to make it look more like t";
        n := n + 1
    od

Here, the tree variable t1 is the control data and it represents
the problem NODES(t1). The difficulty with this attempt at a
program solution is the implementation of the modification of t1.
Such a modification requires close inspection (i.e. a traversal)
of t in order to move t1 toward t. In light of this, it seems
more reasonable to count the nodes of t while it is being
inspected and to dispense altogether with the variable t1.
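
For contrast, the TD alternative can be sketched as follows. This is
our own Python rendering and not the program of Example 3 itself: we
assume a tree is either None (empty) or a pair (left, right), and we
use a list in place of the set s. The function g is a candidate
general loop function, namely n plus the sum of NODES over the trees
in s; the assertion checks that each iteration leaves it unchanged.

```python
def nodes(t):
    """NODES(t): number of nodes in a binary tree (None is the empty tree)."""
    return 0 if t is None else 1 + nodes(t[0]) + nodes(t[1])


def g(n, s):
    """Candidate general loop function: n plus NODES summed over s."""
    return n + sum(nodes(t) for t in s)


def count_nodes(t):
    n, s = 0, ([t] if t is not None else [])
    expected = g(n, s)
    while s:
        left, right = s.pop()      # take one remaining problem instance
        n += 1                     # count its root
        s.extend(c for c in (left, right) if c is not None)
        assert g(n, s) == expected
    return n
```

No second tree has to be built up here; the control data only ever
shrinks toward the empty set.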

As an illustration of another circumstance where the BU
approach seems unreasonable, the reader is encouraged to imagine
a BU implementation of integer exponentiation which operates as
efficiently as the exponentiation program C from Example 11.
Again, a program skeleton suggests itself:


    {y>=0}
    w := 1; c := ?; d := 0;
    while <c,d> /= <x,y> do
        c := sqrt(c);
        if ? then
            d := d * 2 + 1; w := w * c
        else d := d * 2 fi
    od
    {w=x^y}.


Here, we are attempting to move the control data <c,d> toward
<x,y> as fast as we moved it away from <x,y> in TD program C. As
with the BU NODES program, the problem here is how to complete
the program so as to achieve the desired effect. Our conclusion
concerning this program is that supplying an appropriate initial
value for c and determining the proper loop body path to be
executed requires such complexity that this approach is not a
feasible alternative to program C.
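
The TD program being compared against can be sketched as below. The
text of program C is not reproduced in this section, so the Python
version is an assumption on our part: a standard exponentiation by
repeated squaring with variables w, c and t. The assertion states
the loop function g(w,c,t) = w*(c^t), which keeps the value x^y on
every iteration.

```python
def power_td(x, y):
    """x**y for integer y >= 0, consuming the binary digits of y."""
    assert y >= 0
    w, c, t = 1, x, y
    while t > 0:
        # <c,t> stands for the remaining problem instance c^t.
        assert w * c ** t == x ** y
        if t % 2 == 1:
            w *= c
        c *= c
        t //= 2
    return w
```

Each step here needs only the low-order bit of t. Running the same
steps in reverse, as the BU skeleton attempts, would need the
starting value of c and the bits of y in the opposite order, which
is the difficulty described above.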


In this section we have suggested two informal categories of
initialized loop programs. We offered the opinion that the
approach taken in a BU program solution has rather limited
applicability and that TD programs tend to occur more frequently in
practice. We feel that this characterization is useful as a
study of opposing problem solving philosophies but our main
source of motivation is to investigate the kinds of commonly
occurring programs on which the loop function synthesis technique
described above works well.


Consider applying this technique to a general TD program.
In the second constraint function, the control data is bound to a
value which represents a slightly less complex instance of the
general problem being solved by the initialized loop. In
practice, the appearance of this value in the constraint function
suggests the problem decomposition being exploited by the
programmer in order to achieve the program result. Applying this
decomposition in REWRITE leads quite naturally to the desired
general loop function.


Example 12 - Consider the TD factorial program from Example
10b. The second constraint function is

    C2: n>0, z=n, t=n-1 -> g(z,t)=n!

The control data t being bound to n-1 suggests REWRITING n! as
n*((n-1)!). This leads to the correct general loop function. On
the other hand, consider the second constraint function for the
BU factorial program from Example 10a:

    C2: n>0, z=1, t=1 -> g(z,t,n)=n!

How can the expression n! be rewritten in terms of 1, 1 and n?
To obtain the correct general function, the expression would have
to be rewritten as (1*n!)/(1!) which seems much less intuitive
than that required for the TD version. As another point of
comparison, consider the second constraint function for the TD
exponentiation program B from Example 11:

    C2: y>0, w=x, t=y-1 -> g(w,t,x)=x^y

and the second constraint function for the BU exponentiation
program A from the same example:

    C2: y>0, w=x, t=1 -> g(w,t,x,y)=x^y.

In both cases, the proper loop function may be obtained by using
the REWRITE rule x^y = x*(x^(y-1)); however, this particular rule
seems more strongly suggested in the constraint function for the
TD program.
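
The factorial half of this example can be checked mechanically. The
sketch below is in Python and the names g_td, g_bu and check are
ours. It assumes the TD program starts from z=1, t=n and the BU
program from z=1, t=0 (Examples 10a and 10b are not reproduced
here), and confirms that the general loop functions z*t! and
z*(n!/t!) agree with the first two constraint functions of each
program.

```python
from math import factorial


def g_td(z, t):
    """General loop function for the TD program: z * t!"""
    return z * factorial(t)


def g_bu(z, t, n):
    """General loop function for the BU program: z * (n!/t!)"""
    return z * (factorial(n) // factorial(t))


def check(n):
    # First constraint: the initialized state must map to n!.
    assert g_td(1, n) == factorial(n)
    assert g_bu(1, 0, n) == factorial(n)
    if n > 0:
        # Second constraint: the state after one iteration.
        assert g_td(n, n - 1) == factorial(n)    # z=n, t=n-1
        assert g_bu(1, 1, n) == factorial(n)     # z=1, t=1
    return True
```

In the TD case the bound values n and n-1 are exactly the factors of
the rewrite n*((n-1)!); in the BU case the bound values 1 and 1 give
no such hint.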


We remark that the same general phenomenon occurs with TD
programs in the event the control data has been SIMPLIFIED out of
the domain requirement for C2. In this case, the fact that the
control data represents a slightly less complex instance of the
general problem being solved manifests itself in the function
expression for the SIMPLIFIED C2 being a slightly more complex
instance of the problem being solved. For example, the
constraint function C2 above for the TD exponentiation program B of
Example 11 can be SIMPLIFIED to

    t>=0, w=x -> g(w,t,x)=x^(t+1).

Before, the appearance of y-1 in the domain requirement suggested
rewriting x^y as x*(x^(y-1)). Here, the appearance of t+1 in
the function expression suggests rewriting x^(t+1) as x*(x^t)
(see also Examples 1 and 2).


Suppose f is the operation or function the initialized loop
program is intended to compute. In Section 3 we observed that
each REWRITE step in the examples of that section involved
"decomposing" an application of f. This decomposition corresponds
to rewriting that problem instance in terms of a slightly less
complex problem instance (or instances). In general, of course,
there are many ways this decomposition can be performed. In the
examples of that section, however, as with all TD programs, the
nature of the control data guides this decomposition and thus
tends to make the REWRITE step quite straightforward in practice.

The reader may have noticed that the general loop functions
for the BU factorial and exponentiation programs contain more
program variables and operations on those variables than their TD
counterparts. For instance, the general loop functions for the
BU and TD factorial programs are

    0<=t<=n -> g(z,t,n)=z*(n!/t!)

and

    0<=t -> g(z,t)=z*t!

respectively. This fact, by itself, helps explain why the
REWRITE step seems more difficult for the BU programs. It would
be a mistake, however, to assume that the BU programs are more
"complex" or are more difficult to analyze or prove. We consider
TD loops to be somewhat more susceptible to the form of induction
employed in functional loop verification. More precisely, the
inductive hypothesis required in this type of proof (i.e. a
general statement concerning the loop input/output behavior) seems
to be more easily stated for TD programs than for BU programs.
On the other hand, BU programs seem somewhat more susceptible to
an inductive assertion proof. The inductive hypothesis required
in this type of proof (i.e. a sufficiently strong loop invariant)
involves fewer program variables and operations on those
variables than the same type of hypothesis for the corresponding TD
loop. As an example, the BU and TD factorial programs have
adequate loop invariants 0<=t & z=t! and 0<=t<=n & z=n!/t!
respectively.
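
The trade-off between the two kinds of hypothesis can be observed by
running both programs with the invariant and the loop function
asserted side by side. The Python below is our own sketch; the loop
bodies are assumed forms of the BU and TD factorial programs, which
are not reproduced in this section.

```python
from math import factorial


def fact_bu(n):
    z, t = 1, 0
    while t < n:
        assert 0 <= t and z == factorial(t)                # invariant
        assert z * (factorial(n) // factorial(t)) == factorial(n)  # g(z,t,n)
        t += 1
        z *= t
    return z


def fact_td(n):
    z, t = 1, n
    while t > 0:
        assert 0 <= t <= n and z == factorial(n) // factorial(t)  # invariant
        assert z * factorial(t) == factorial(n)                   # g(z,t)
        z *= t
        t -= 1
    return z
```

In fact_bu the invariant is the shorter assertion and the loop
function the longer one; in fact_td the roles are reversed.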

In [Manna & Waldinger 70], the authors describe a program
synthesis technique and point out that their method produces
either of the above factorial programs depending upon which type
of induction rule the synthesizer is given to employ.

7.  Related  Work 

In [Basu & Misra 76, Misra 73, Misra 79], the authors
describe two classes of "naturally provable" programs for which
generalized loop specifications can be obtained in a
deterministic manner. Our technique sacrifices determinism in favor of
wide applicability and ease of use. It handles in a fairly
straightforward manner typical programs in these two program
classes (e.g. Examples 1-3) as well as a number of programs
which do not fit in either of the classes (e.g. Examples 4-5).

Due to the close relationship between loop functions and
loop invariants (see, for example, [Morris & Wegbreit 77]), any
technique for synthesizing loop invariants can be viewed as a
technique for synthesizing general loop functions (and vice
versa). In this light, our method bears an interesting
resemblance [...] by pushing the previous approximation back
through the loop once, twice, etc.

By way of illustration, consider the exponentiation program
of Example 2. The loop exit condition can be used to obtain an
initial loop invariant approximation

    d=0 -> w=c0^d0.

This approximation can be strengthened by pushing it back through
the loop to yield

    (d=0 -> w=c0^d0) & (d=1 -> w*c=c0^d0).

In the analysis presented in Example 2, we obtained a value for
the generalized function specification for each of two different
values of the initialized variable w (i.e. 1 and SQRT(c)); here
we have obtained a "value" for the loop invariant we are seeking
for each of two different values of the variable which controls
the termination of the loop d. Applying the analysis in [Morris
& Wegbreit 77], these loop invariant "values" can be translated
to constraint functions as follows:

    d=0 -> g(w,c,d)=w,
    d=1 -> g(w,c,d)=w*c.

Of course, the function expression w*c in the second constraint
can be rewritten w*(c^1); SUBSTITUTING as usual suggests the
general loop function

    g(w,c,d)=w*(c^d).

If we then add the program precondition as a domain restriction
on this function, the result is the same general loop function
discovered in Example 2.
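
The function suggested by these two constraints can be tested
against a loop body. Example 2 is not reproduced in this section, so
the body below is an assumed repeated-squaring step written in
Python; the names body, g and check are ours. The sketch confirms
the two constraint functions and then checks that g(w,c,d) = w*(c^d)
keeps its value across every iteration, which is what a loop
function must do.

```python
def body(w, c, d):
    """One iteration of an assumed Example 2 style loop body."""
    if d % 2 == 1:
        w = w * c
    return w, c * c, d // 2


def g(w, c, d):
    """Candidate general loop function w * (c^d)."""
    return w * c ** d


def check(w, c, d):
    assert g(w, c, 0) == w          # d=0 -> g(w,c,d)=w
    assert g(w, c, 1) == w * c      # d=1 -> g(w,c,d)=w*c
    while d != 0:
        nxt = body(w, c, d)
        assert g(*nxt) == g(w, c, d)
        w, c, d = nxt
    return w
```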

We summarize the relationship between these two techniques
as follows. As the initialized loop in question operates on some
particular input, let X[0], X[1], ..., X[N] be the sequence of
states on which the loop predicate is evaluated (i.e. the loop
body executes N times). Of course, in X[0], the initialized
variables have their initialized values, and in X[N], the loop
predicate evaluates to FALSE. The method proposed in this paper
suggests inferring the unknown loop function g from X[0], X[1],
g(X[0]) and g(X[1]). The loop invariant technique described
above, when viewed as a loop function technique, suggests
inferring g from X[N], X[N-1], g(X[N]) and g(X[N-1]). Speaking
roughly then, one technique uses the first several executions of
the loop, the other uses the last several executions. One
ignores the information that the loop must compute the identity
function on inputs where the loop predicate is FALSE, the other
ignores the information that the loop must compute like the
initialized loop when initialized variables have their initialized
values.
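
The state sequence can be written out for a small case. The sketch
below is ours, in Python, and uses an assumed form of the TD
factorial program; each state is the pair (z, t) seen at a loop
test. The function endpoints returns the two states available to
each technique together with the final result.

```python
def trace_td_factorial(n):
    """States (z, t) on which the loop predicate t > 0 is evaluated."""
    z, t = 1, n
    states = [(z, t)]
    while t > 0:
        z, t = z * t, t - 1
        states.append((z, t))
    return states


def endpoints(n):
    X = trace_td_factorial(n)
    result = X[-1][0]
    first = X[:2]    # X[0], X[1]: g is known to equal n! on both
    last = X[-2:]    # X[N-1], X[N]: g is known from the exit condition
    return first, last, result
```

For n = 4 the first two states are (1, 4) and (4, 3), which expose
the factors of 4!, while the last two are (24, 1) and (24, 0), which
show only the finished value.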


Earlier we discussed "top down" and "bottom up" approaches
to synthesizing g and indicated that our technique fit in the
"top down" category. The technique based on the last several
iterations is a "bottom up" approach. It is difficult to
carefully state the relative merits of these two opposing techniques.
In our view, however, there are a number of circumstances under
which the technique based on the first several loop executions
seems more "natural" and easily applied. These examples include
the NODES program, the program to compute Ackermann's function
and the TD factorial program discussed above. The reason is that
a critical aspect of the general loop function is the function
computed by the initialized loop program (e.g. exponentiation in
the above illustration). In the technique based on the first
several iterations, this function appears explicitly in the
constraint functions. In the other technique, this information must
somehow be inferred from the corresponding constraint functions
(e.g. by looking for a pattern in these functions, etc.). This
difficulty is inherent in any "bottom up" approach to
synthesizing g.


8. Concluding Remarks

In this paper we have proposed a technique for deriving
functions which describe the general behavior of a loop which is
preceded by initialization. These functions can be used in a
functional [Mills 75] or subgoal induction [Morris & Wegbreit 77]
proof of correctness of the initialized loop program. It is not
our intention to imply that verification should occur after the
programming process has been completed. There are, however, a
large number of existing programs which must be read, understood,
modified and verified by [...] heuristic as a tool which is [...]

It has been argued [Misra 73] that the notion of closure of
a loop with respect to an input domain is fundamental in
analyzing the loop. In Section [...] this idea is applied to initialized
loop programs. The result is that a loop function g for a loop
which is N-closed (for some N>0) can be synthesized in a
deterministic manner by considering the first N constraint functions.
Hence this categorization can be viewed as one measure of the
"degree of difficulty" involved in verifying initialized loop
programs.

An interesting direction for future research is the
development of a precise characterization of programs which are not
"tricky" (as discussed in Section 5). Preliminary results along
this line are described in [Dunlop & Basili 81] (see also [Basu
80]).


In Section 5 we discussed on an informal level the opposing
BU and TD problem solving strategies and their corresponding
initialized loop realizations. We argued that the TD approach
appeared to be more widely applicable and that, in practice, TD
programs seem to occur more frequently. We explained the success
of the proposed loop function creation technique on these
programs in terms of an easily applied REWRITE step. These results
are offered to help support our view that the technique may be
successfully applied in a wide range of applications.